Abstract: Risks arising from the effect of disruptions and unsustainable practices constantly push 
the supply chain to uncompetitive positions. A smart production planning and control process 
must successfully address both risks by reducing them, thereby strengthening supply chain (SC) 
resilience and its ability to survive in the long term. On the one hand, the antidisruptive potential 
and the inherent sustainability implications of the zero-defect manufacturing (ZDM) management 
model should be highlighted. On the other hand, the digitization and virtualization of processes by 
Industry 4.0 (I4.0) digital technologies, namely digital twin (DT) technology, enable new simulation 
and optimization methods, especially in combination with machine learning (ML) procedures. This 
paper reviews the state of the art and proposes a ZDM strategy-based conceptual framework that 
models, optimizes and simulates the master production schedule (MPS) problem to maximize service 
levels in SCs. This conceptual framework will serve as a starting point for developing new MPS 
optimization models and algorithms in supply chain 4.0 (SC4.0) environments. 


Keywords: supply chain 4.0; master production schedule; zero-defect manufacturing; digital twin; 
machine learning 


1. Introduction 


Since artificial intelligence began to make its way into almost all the sectors of today’s 
society, the adjectives intelligent or smart have become commonplace to describe a myriad 
of entities which are, in one way or another, endowed with the ability to react to changes 
in the environment to establish optimal operating conditions by themselves. We can find 
some examples in the industrial sector, such as intelligent software, intelligent systems, 
and intelligent agents, or smart grids, smart sensors, smart products, among others. For a 
supply chain 4.0 (SC4.0), understood as the supply chain (SC) that is reorganized by using 
the design principles and enabling technologies of the Industry 4.0 (I4.0) spectrum [1], it 
seems appropriate to link the intelligent or smart attributes with SC abilities to overcome the 
risks that it faces and survive as the main proof of its capability to respond to challenging 
changes in the environment and to achieve optimal operating conditions. Along these 
lines, and regardless of whether causes are natural, economic, political or technological, 
disruption is the most significant risk that an SC faces in the short and mid terms. On a 
long-term horizon, lack of sustainability is one of the main risks for SC survival. Thus an 
SC4.0, as a smart SC, should be resilient and sustainable. 

The effect of technological advances on industrial companies is indeed remarkable, 
and guides their development toward a production paradigm in which resilience and 
sustainability emerge as decisive SC management elements, both for occupying better 
market positions in the future and for survival [2]. The digital transformation that SCs 
undergo on their way toward SC4.0 can help to address, from a more favorable position, 
those aspects that compromise resilience and sustainability, by using I4.0 
design principles and enabling technologies to mitigate the complexity and heterogeneity 
imposed by the flow of materials, products and information through an SC, 
which consists of multiple stages and nodes that are largely discrete and isolated from one 
another [3]. Furthermore, the implementation of I4.0 principles and technologies to drive 
SCs toward SC4.0 can be considered a process of progressive improvement and 
transformation of SCs that manages the introduction of new technologies and socio-environmental 
dimensions, with sustainability and resilience at the core of improvement [4]. Indeed, 
apart from the purely paradigmatic perspective of I4.0, two other relevant aspects can be 
distinguished in SC4.0: technological and sustainability-related implications. In I4.0, new 
information and communication technologies are mainly considered to address the rising 
complexity of current industrial contexts and are, therefore, directly related 
to digital transformation. SC4.0 also aims to integrate humans and the environment into 
future industrial systems, which implies sustainable transformation [5]. 

In line with this, one way to digitally and sustainably 
transform SCs with an eye on the I4.0 paradigm is to promote the automation of SC 
production systems and processes, as well as their ability to respond in real time to changes 
in the environment, so that they react rapidly to unforeseen situations and thus accomplish their 
assignments [6]. This approach is twofold: (i) this capability can 
be improved by intervening in the design of the involved systems and processes themselves 
through an endogenous or enabling strategy; (ii) it can also be enhanced by acting on the 
environment and its potential to alter those systems and processes through an exogenous 
strategy that aims to mitigate the stochasticity of the milieu, which could also be described as 
antidisturbing [7]. Applying this approach to different SC areas 
requires tailoring its implementation. In the production planning and control area, that 
requirement extends, in turn, to its three decision-making levels: strategic, tactical 
and operational. Specifically at the tactical decision level, the procedures employed 
in the master production schedule (MPS) can also be improved by applying 
I4.0 enabling technologies and other management models to enhance SC resilience and 
sustainability. According to the Association for Supply Chain Management (APICS) [8], the 
MPS is an issue determined by the SC planning environment according to four important 
component elements: (i) its strategies (make to stock, make to order, engineer to order, 
configure to order, and assemble to order, among others); (ii) the number and type of 
involved stakeholders (suppliers, warehousers, manufacturers, distributors, retailers); (iii) 
structure (hierarchy with its tiers and relations); (iv) the nature of activities (production, 
distribution and/or procurement). A transformation approach of MPS procedures that 
pursues higher levels of SC resilience and sustainability should, therefore, take into account 
its strategies, stakeholders, structure and activities. In this complex multiagent context, 
the potential of the digital twin (DT) [6,9-13] to simulate, optimize, predict, share and visualize 
data in real time is significant, and can help to collaboratively assist the MPS from 
the above-described endogenous perspective, whereas zero-defect manufacturing (ZDM) 
can be used to support the exogenous one, as it allows failures and defects during the 
production process to be minimized, mitigated or eliminated according to the “do things 
right the first time” philosophy [14-18]. ZDM can also be seen as a new standard from the 
sustainability perspective because minimizing, mitigating or eliminating 
defective products and processes implies specific waste management [17,19]. 
This research addresses the MPS from this integral approach. 

An additional difficulty in addressing the MPS arises from the required computational 
efficiency. At the tactical level, the MPS needs a temporo-spatial disaggregation of cumulative 
planning targets and forecasts, along with the provision and forecasting of required 
resources. This procedure becomes difficult and slows down as the number 
of considered resources, products and time periods increases [20-22], because the space of 
feasible solutions grows exponentially with the number of nodes (elements 
containing a product, period and resource) in the system, which makes it an NP-hard 
problem. Most classic modeling approaches (simulation methods, heuristics, metaheuristics, 
matheuristics) present computational limitations as the MPS problem dimension 
grows, particularly if the MPS is posed as a multi-objective problem. These limitations can 
lead to unacceptable computational times for a decision support system (DSS) that is 
expected to facilitate real-time decision making, especially when the intention is to provide 
it with a certain level of autonomy, as the approach set out in this research calls for. The 
potential of machine learning (ML) to tackle this situation is remarkable at any production 
planning and control decision level [23,24] and its application to the MPS problem should, 
therefore, be considered. Furthermore, the feasibility of formulating the MPS problem as 
a Markov decision process (MDP) [25-27] points to reinforcement 
learning as the most suitable candidate among existing ML methodologies. 
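
To make the MDP reading of the MPS more concrete, the following minimal sketch casts a deliberately simplified MPS (one product, one resource, known demand) as an MDP: the state is the pair (period, inventory), the action is the quantity to produce, and the reward is the negative holding and backlog cost. It is an illustration only and is not taken from the reviewed literature; every name and parameter value (`MpsState`, `step`, `rollout`, `CAPACITY`, `HOLDING_COST`, `BACKLOG_COST`) is an assumption made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative assumptions: one product, one resource, known demand per period.
CAPACITY = 3          # maximum units that can be produced in one period
HOLDING_COST = 1.0    # cost per unit left in stock at the end of a period
BACKLOG_COST = 4.0    # cost per unit of unmet demand at the end of a period


@dataclass(frozen=True)
class MpsState:
    period: int      # index of the current planning period
    inventory: int   # on-hand stock; negative values represent backlog


def actions(state: MpsState) -> range:
    """Feasible production quantities (here the same in every state)."""
    return range(CAPACITY + 1)


def step(state: MpsState, produce: int, demand: list[int]) -> tuple[MpsState, float, bool]:
    """Apply one MDP transition and return (next_state, reward, done)."""
    if produce not in actions(state):
        raise ValueError("production quantity is outside the capacity limits")
    inventory = state.inventory + produce - demand[state.period]
    cost = HOLDING_COST * max(inventory, 0) + BACKLOG_COST * max(-inventory, 0)
    next_state = MpsState(state.period + 1, inventory)
    done = next_state.period == len(demand)
    return next_state, -cost, done


def rollout(plan: list[int], demand: list[int]) -> float:
    """Total reward obtained by following a fixed production plan."""
    state, total = MpsState(0, 0), 0.0
    for produce in plan:
        state, reward, _ = step(state, produce, demand)
        total += reward
    return total


demand = [2, 3, 1]
print(rollout([2, 3, 1], demand))  # plan that matches demand period by period
print(rollout([0, 3, 3], demand))  # plan that starts late and accumulates backlog
```

In this toy setting there are already (CAPACITY + 1) ** len(demand) candidate plans, which hints at why enumeration stops being practical as products, resources and periods are added. A reinforcement learning agent would replace the fixed plan with a policy that chooses `produce` from the observed state, and stochastic demand would make the transition probabilistic.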

Thus the combined use in the MPS process of (i) the DT enabling technology, (ii) the 
ZDM management model, and (iii) ML-based modeling approaches is particularly relevant 
because it can guide SCs toward positions of greater resilience and sustainability and, for 
this reason, can be qualified as a smart approach that provides threefold and complementary 
assistance to SCs’ responsiveness to changes in their environment. Nevertheless, 
this joint perspective of the smart improvement of the MPS problem has 
not yet been addressed by the academic community: only one paper in the 
existing literature provides a simple initial conceptual framework that coincides with the 
joint approach indicated herein. 

This paper presents an overview of the addressed topics from a joint perspective and, 
to bridge this knowledge gap, proposes an initial ML-based DT framework for automated 
MPS management in an SC4.0 context with a zero-defect characteristic, which we call smart 
MPS, to provide an answer to the following research questions: 


- RQ1: What mechanisms can make the DT competent in assisting the MPS process 
from an enabling strategy? 

- RQ2: How can ML techniques help to overcome the difficulties that arise from the 
MPS problem’s computational efficiency? 

- RQ3: How does the ZDM anti-disturbing strategy push MPS to achieve a more 
resilient and sustainable SC? 

- RQ4: Can the DT technology, the ZDM management model and ML-based modelling 
approaches be considered conceptual complementary tools that support MPS and 
push it to higher resilience and sustainability levels? 


The rest of the paper is organized as follows. Section 2 first provides definitions of the 
main concepts included in this research and subsequently offers an overview of the related 
literature. Section 3 describes the conceptual proposal by defining an initial framework 
and presenting the setup of a smart MPS. Section 4 discusses the main implications of the 
proposal formulated by reviewing how the proposal responds to the research questions. 
Finally, Section 5 provides the main conclusions and further research. 


2. Literature Review 

The review of the selected literature was carried out in four stages: (i) a semantic 
introduction to the main involved concepts; (ii) a literature search; (iii) a thematic approach 
of the joint domain making up the selected literature; (iv) a content analysis to identify the 
main contributions from the perspective of this research. 


2.1. The Main Involved Concepts 


The introductory definitions of the main concepts employed in this research 
are provided in Table 1. 
Table 1. Definitions of the main concepts. 


Concept 


Definitions 


Industry 4.0 
(Enabling context) 


I4.0 stands for the fourth industrial revolution, which is 
defined as a new level of organization and control over 
the entire value chain of products’ life cycle. It is geared to 
increasingly individualized customer requirements [28]. A 
combination of digital technology with manufacturing that 
transforms industrial production to the next level [29]. The 
convergence of industrial production, information and 
communication technologies [30]. 


Supply chain 4.0 
(Target context) 


A transformational holistic approach to SC management 
that utilizes I4.0 disruptive technologies to streamline SC 
processes, activities and relations to generate significant 
strategic benefits for all the SC stakeholders [31]. SC4.0 is 
the SC created as a result of the new digital era brought 
forth by the fourth industrial revolution [32], I4.0. The 
reorganization of SCs (design and planning, production, 
distribution, consumption, reverse logistics) using 
technologies known as I4.0 [1]. 


Master production schedule 
(Research object) 


A line on the master schedule grid that reflects the 
anticipated build schedule of those items assigned to it, 
and one that represents the items that a company plans to 
produce and are expressed as specific configurations, 
quantities and dates [8]. The MPS is essential for 
maintaining customer service levels and stabilizing 
production planning in a material requirements planning 
(MRP) environment [33]. The MPS drives the MRP system 
and provides an important link between the forecasting, 
order entry, and production planning activities on the one 
hand, and the detailed planning and scheduling of 
components and raw materials on the other hand [34]. 


Digital twin 
(Research tool) 


A dynamic model in the virtual world that is fully 
consistent with its corresponding physical entity in the 
real world and can simulate its physical counterpart’s 
characteristics, behavior, life, and performance in a timely 
fashion [35]. A virtual model in the virtual space that is 
used to simulate the behavior and characteristics of the 
corresponding physical object in real time [36]. A virtual 
and computerized counterpart of a physical system that 
can exploit the real-time synchronization of the sensed 
data from the field and is closely linked with I4.0 [37]. 


Machine learning 
(Research tool) 


A computer program capable of learning from experience 
to improve a performance measure of a given task [38]. 
ML is an evolving branch of computational algorithms, 
designed to emulate human intelligence by learning from 
the surrounding environment [39]. ML is an artificial 
intelligence application that provides computers with the 
ability to automatically learn and improve from 
experience with no direct programming [40]. 


Zero-defect manufacturing 
(Research tool) 


A strategy whose goal is to decrease and mitigate failures 
in manufacturing processes and to do things right the first 
time [41]. A manufacturing strategy which, by assuming 
that errors and failures will always exist, focuses on 
minimising and detecting them online so that no 
production output deviates from specification advances to 
the next step [16]. ZDM consists of four strategies: 
detection, repair, prediction, prevention [42]. 
Intelligence 
(I4.0 design principle) 


The attribute that defines an artificial system’s behavior 
which, if a human behaves in the same way, is considered 
intelligent [43]. Intelligence assists decision making by 
converting raw business data into valuable and 
meaningful information and knowledge [44], and is 
supported by the development of advanced analytics and 
data visualization models, platforms and services that 
support decision-making processes [45]. Intelligence is a 
corporate capability to forecast change, regardless of it 
coming in the form of opportunity or threat, and in time to 
do something about it [46]. 


Real-time action ability 
(I4.0 design principle) 


A set of conditions, qualities and abilities that allows a 
device or system to correctly perform a function when 
interacting with a real-world physical process that shares 
the same temporal constraints. In the SC context, this 
capability characterizes the way in which a given SC 
device or system successfully performs its function within 
the time frame that configures the process with which it 
interacts without altering the pace of its progress. This 
capability is one of the main concerns in an SC as it 
speeds up the elicitation of responses during decision 
making and, consequently, increases its efficiency [47]. 


Supply chain resilience 
(Expected effect) 


Resilience is an SC’s capacity to persist, adapt or 
transform when faced with change from both engineering 
and social-ecological perspectives [48]. An SC’s adaptive 
capability to prepare for and/or respond to disruptions, 
to make a timely and cost-effective recovery and, 
therefore, to progress to a post-disruption state of operations, 
ideally a better state than that before the disruption [49]. 
SC resilience is the adaptive capability to prepare for 
unexpected events, respond to disruptions, and recover 
from them by maintaining the continuity of operations at 
the desired level of connectedness and control over both 
structure and function [50]. 


Supply chain sustainability 
(Expected effect) 


SC sustainability is the management of environmental, 
social and economic impacts, and the encouragement of 
good governance practices, throughout the life cycles of 
goods and services [51]. The extent to which the SC 
organization’s decisions impact the future situation of the 
natural environment, society and business viability [52]. A 
sustainable SC is one that includes measures of profit and 
loss, as well as social and environmental dimensions. Such 
conceptualization has been referred to as the sustainability 
triple dimension: financial, social, environmental [53]. 


2.2. Literature Search 


The SC is a conceptual realm that has been approached from many different angles, 
with more than 50,000 entries in Scopus in the past decade alone. In an attempt to identify 
all those trends, Maryniak et al. [54] diagnose the dominant SC topic areas of the 
last three decades. However, hardly any literature has been identified that simultaneously 
addresses the MPS problem in the SC from the ZDM perspective and with the joint support 
of DT and ML technologies. Thus, in the Scopus database, the search string TITLE-ABS-KEY 
((“supply chain” OR “supply network”) AND (“master production” OR mps) AND 
(zdm OR “zero * defect”) AND “digital twin” AND (“machine learning” OR “artificial 
intelligence”)) returned only one result, which evidences a knowledge gap. For this reason, 
we further explored the existing literature in the individual knowledge domains MPS, 
ZDM, DT and ML, applied specifically to the SC, which added 24 relevant papers to the 
aforementioned one (Table 2). 
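
The search string above is a conjunction of keyword groups, each of which is a disjunction of synonyms. As a minimal sketch of how such a string could be assembled reproducibly, and how the per-domain variants could be derived by dropping groups, the helper functions `any_of` and `title_abs_key` below are hypothetical names introduced only for this illustration; they build text and do not call any Scopus service.

```python
def any_of(*terms: str) -> str:
    """Join alternative terms with OR, quoting multi-word phrases."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    joined = " OR ".join(quoted)
    # Parentheses are only needed when there is more than one alternative.
    return f"({joined})" if len(quoted) > 1 else joined


def title_abs_key(*groups: str) -> str:
    """Require every keyword group in the title, abstract or keywords."""
    return "TITLE-ABS-KEY (" + " AND ".join(groups) + ")"


supply_chain = any_of("supply chain", "supply network")
mps = any_of("master production", "mps")
zdm = any_of("zdm", "zero * defect")
digital_twin = any_of("digital twin")
ml = any_of("machine learning", "artificial intelligence")

# Joint query covering all five keyword groups.
joint_query = title_abs_key(supply_chain, mps, zdm, digital_twin, ml)
print(joint_query)

# A narrower, single-domain variant (assumed form, for illustration only).
dt_query = title_abs_key(supply_chain, digital_twin)
print(dt_query)
```

Keeping each keyword group in its own variable makes it explicit which groups were combined in a given search, which helps when the joint query returns too few results and the domains have to be explored separately.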


Table 2. Relevant literature about MPS, DT, ML, and ZDM applied specifically to the SC. 


Author 


Title 


Chern et al., 2014 [55] 


Solving a Multi-Objective Master Planning Problem with 
Substitution and a Recycling Process for a Capacitated 
Multi-Commodity Supply Chain Network 


Grillo et al., 2015 [56] 


Application of Particle Swarm Optimisation with 
Backward Calculation to Solve a Fuzzy Multi-Objective 
Supply Chain Master Planning Model 


Sutthibutr and 
Chiadamrong, 2019 [57] 


Applied Fuzzy Multi-Objective with α-Cut Analysis for 
Optimizing Supply Chain Master Planning Problem 


Arani and Torabi, 2018 [58] 


Integrated Material-Financial Supply Chain Master 
Planning under Mixed Uncertainty 


Ghasemy et al., 2020 [59] 


Robust Master Planning of a Socially Responsible Supply 
Chain under Fuzzy-Stochastic Uncertainty (A Case Study 
of Clothing Industry) 


Martin et al., 2020 [60] 


Master Production Schedule Using Robust Optimization 
Approaches in an Automobile Second-Tier Supplier 


Peidro et al., 2012 [61] 


Fuzzy Multi-Objective Optimisation for Master Planning in 
a Ceramic Supply Chain 


Serrano et al., 2021b [18] 


Digital Twin for Supply Chain Master Planning in 
Zero-Defect Manufacturing 


Orozco-Romero et al., 2020 [62] 


The Use of Agent-Based Models Boosted by Digital Twins 
in the Supply Chain: A Literature Review 


Marmolejo-Saucedo et al., 2020 [12] 


Digital Twins in Supply Chain Management: A Brief 
Literature Review 


Barykin et al., 2020 [63] 


Concept for a Supply Chain Digital Twin 


Ivanov et al., 2019 [13] 


Digital Supply Chain Twins: Managing the Ripple Effect, 
Resilience, and Disruption Risks by Data-Driven 
Optimization, Simulation, and Visibility 


Ivanov and Das, 2020 [64] 


Coronavirus (COVID-19/SARS-CoV-2) and Supply Chain 
Resilience: A Research Note 


Dolgui et al., 2020 [65] 


Reconfigurable Supply Chain: The X-Network 


Park et al., 2021 [66] 


The Architectural Framework of a Cyber Physical Logistics 
System for Digital-Twin-Based Supply Chain Control 


Wang et al., 2020 [10] 


Digital Twin-Driven Supply Chain Planning 


Alves and Mateus, 2020 [67] 


Deep Reinforcement Learning and Optimization Approach 
for Multi-Echelon Supply Chain with Uncertain Demands 


Peng et al., 2019 [68] 


Deep Reinforcement Learning Approach for Capacitated 
Supply Chain Optimization under Demand Uncertainty 


Boute et al., 2021 [69] 


Deep reinforcement learning for inventory control: A 
roadmap 


Afridi et al., 2020 [70] 


A Deep Reinforcement Learning Approach for Optimal 
Replenishment Policy in A Vendor Managed Inventory 
Setting For Semiconductors 


Kegenbekov and Jackson, 2021 [71] 


Adaptive supply chain: Demand-supply synchronization 
using deep reinforcement learning 


Siddh et al., 2014 [72] 


Integrating Lean Six Sigma and Supply Chain Approach 
for Quality and Business Performance 


Pardamean and Wibisono, 2019 [73] 


A framework for the Impact of Lean Six Sigma on Supply 
Chain Performance in Manufacturing Companies 


Poornachandrika and Venkatasudhakar, 2020 [74] 


Quality Transformation to Improve Customer Satisfaction: 
Using Product, Process, System and Behavior Model 


Thakur and Mangla, 2019 [75] 


Change Management for Sustainability: Evaluating the 
Role of Human, Operational and Technological Factors in 
Leading Indian Firms in Home Appliances Sector 


Given the special relevance of some of the involved concepts, such as: (i) I4.0, for 
representing the enabling context; (ii) SC4.0, for representing the target context; (iii) intelligence 
and real-time action ability, for constituting the main I4.0 design principles implied in the 
proposed research line; (iv) resilience and sustainability, for being the ultimate expected 
effects of applying the DT-ML-ZDM scheme in the MPS, the review of these 25 papers has 
also considered the treatment given to all these concepts. 


2.3. Thematic Analysis 


The thematic analysis of the selected papers, carried out with the VOSviewer 1.6.16 
tool, shows (Figure 1) a first grouping of concepts around the main one, “supply chain”, 
which is connected to four other groupings: “digital twin”, “production plan”, “digital 
technology” and “supply chain management”. From the thematic map, and based on the 
co-occurrences in the text composed of the title and the abstract of each paper, we observe 
that: (i) the main group, headed by “supply chain”, also contains “organization”, 
“ZDM”, “sustainability” and “uncertainty”; (ii) the most closely related concept to 
“supply chain” is “digital twin”, which might reveal the importance that this technology 
has acquired in the SC field; (iii) “digital twin” and “supply chain planning” form a cluster, 
which shows the importance that the DT has in academia for SC planning processes; (iv) the 
“production plan” cluster is also formed by “CPS” (cyber-physical systems) and “agent”, 
which can place them as common tools for researchers in production planning; (v) the 
cluster headed by “digital technology” includes concepts such as “quality”, “simulation”, 
“ripple effect” and “resilience”, and this relation can show where digital technology draws 
useful attention or generates interest in the SC domain; (vi) the cluster headed by “supply 
chain management” also integrates “knowledge” and “future research”, which might be 
related to the shown interest in acquiring new knowledge into the SC domain that supports 
improvements to its management processes. 
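
The co-occurrence counts that underlie a thematic map of this kind can be illustrated with a minimal sketch. It is not VOSviewer's implementation: the counting rule (one count per text in which both terms appear), the plain substring matching, the function name `cooccurrences` and the three sample texts are all simplifying assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrences(texts, terms):
    """Count, for every pair of terms, the texts in which both appear."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        # Plain substring matching: "digital twins" also matches "digital twin".
        present = sorted(term for term in terms if term in lowered)
        counts.update(combinations(present, 2))
    return counts


# Hypothetical title/abstract snippets, used only to exercise the function.
sample_texts = [
    "Digital twin for supply chain master planning",
    "Digital twins in supply chain management",
    "Resilience and simulation",
]
sample_terms = ["supply chain", "digital twin", "resilience"]

pair_counts = cooccurrences(sample_texts, sample_terms)
for pair, count in pair_counts.most_common():
    print(pair, count)
```

Pairs with high counts would appear as strongly linked nodes on the map, whereas terms that never share a text with another term contribute no link.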


2.4. Content Analysis 


Proposing new resolution and optimization models has been a recurrent approach in 
the specific SC literature segment that focuses on the MPS. Chern et al. [55] put forward a 
multi-objective MPS resolution model with a heuristic method based on a genetic algorithm, 
called the GA-based master planning algorithm (GAMPA), to solve an MPS problem with 
multiple final products, substitutions and a recycling process with a stochastic pattern, 
which creates a loop in both the SC and product structure trees. Grillo et al. [56] use 
fuzzy set theory to model uncertainty and propose a metaheuristic particle swarm 
optimization (PSO) technique as a solution method. Sutthibutr and Chiadamrong [57] 
propose a method to achieve an optimal MPS in an uncertain environment. It is based 
on a multi-objective linear fuzzy model with an α-cut analysis to ensure that 
the result satisfies decision makers’ preferences based on a specified minimum allowed 
satisfaction value (α). Arani and Torabi [58] integrate physico-material tactical plans with financial ones 
to account for their reciprocal effects in a bi-objective mixed possibilistic-stochastic model 
for an SC master planning problem. Ghasemy et al. [59] propose a mixed integer nonlinear 
programming model with probabilistic constraints to determine centralized planning, 
viewed from the sustainability perspective under uncertainty. Here the sustainability 
aspect is reduced to a sustainable procurement planning addressed by appropriate supplier 
selection. Martin et al. [60] address the uncertain MPS problem for an automotive 
second-tier supplier with two optimization approaches based on other authors’ research. Both 
were tested in a real automotive SC and compared to a deterministic approach. The MPS 
problem for a centralized SC of replenishment, production and distribution is tackled by 
Peidro et al. [61], who present a fuzzy multi-objective linear programming approach to 
model it. 
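
Several of the models above describe uncertain parameters with fuzzy numbers, and the α-cut mentioned for [57] has a simple closed form when the fuzzy number is triangular. The following minimal sketch shows only that textbook operation, not the model of any cited paper: for a triangular fuzzy number (low, mode, high), the α-cut is the interval of values whose membership degree is at least α. The function name `alpha_cut` and the demand figures are hypothetical.

```python
def alpha_cut(low: float, mode: float, high: float, alpha: float) -> tuple[float, float]:
    """Interval of a triangular fuzzy number with membership of at least alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not low <= mode <= high:
        raise ValueError("expected low <= mode <= high")
    # Membership rises linearly from low to mode and falls linearly from mode to high.
    lower = low + alpha * (mode - low)
    upper = high - alpha * (high - mode)
    return lower, upper


# Hypothetical fuzzy demand: at least 80, most plausibly 100, at most 130 units.
for alpha in (0.0, 0.5, 1.0):
    print(alpha, alpha_cut(80, 100, 130, alpha))
```

Raising α narrows the interval toward the most plausible value, so a higher minimum satisfaction level leaves the planner with a tighter range of parameter values to plan against.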


Figure 1. Thematic map. 


Serrano et al. [18] propose an initial DT-based conceptual framework to model and 
simulate the MPS problem with a ZDM feature in the SC4.0 context. This is the only paper 
in the literature to address the focus of this research comprehensively, albeit with an initial 
descriptive approach. This framework focuses on creating an enabling space for solving 
optimization algorithms for the MPS problem based on applying deep reinforcement learn- 
ing (DRL) techniques. The framework is designed to accommodate the set of actors in the 
SC, along with their physical and virtual processes and resources in a collaborative manner. 
Its design aims to improve SC performance by reinforcing the digitization, intelligence, 
visibility, interconnectedness, organization and sustainability I4.0 attributes. This initial 
framework is restricted to the manufacturer and goes up to second-tier suppliers to narrow 
down the problem’s scope. 

According to Orozco-Romero et al. [62], the DT technology is a tool that enables both 
real-time digital monitoring and automatic decision making. Therefore, DTs are relevant 
tools when pursuing the goal of automating SC systems. Marmolejo-Saucedo et al. [12] 
review the scientific literature on DTs as one of the main I4.0 enabling technologies within 
the SC management realm. The association of DTs with SC visibility, and the possibility 
of planning and making real-time decisions, lead to better disruptive risk management 
and higher resilience levels. Along such lines, Barykin et al. [63] attribute the need to 
build DTs to SCs’ poor reliability and stability, which result from errors in their operation. They 
assert that DTs can generate information on the impact of such errors, and can influence 
SC performance by observing different scenarios that simulate the location of errors and 
their duration, and by analyzing recovery policies. All this leads to greater SC resilience. 
Ivanov et al. [13] explain the SC DT concept and propose a framework for risk management 
by analyzing perspectives and future transformations that can help to integrate resilience 
owing to the information provided by the DT. According to the authors’ paper, an SC 
DT is a model that can represent the network state for any given moment in time, and 
allows for complete end-to-end SC visibility to improve resilience and to test contingency 
plans, which is clearly aligned with the approach of this research by focusing on resilience 
and sustainability. The research by Ivanov and Das [64] is centred on SC resilience after 
disruptive events occurring as a result of the COVID-19 pandemic and how to optimally 
recover normalcy in an SC. It identifies the need to implement such a partnership to map 
supply networks and to ensure their visibility as a tool to recover from disruption, where 
the DT can play a significant role by taking the disruptive effect of the pandemic as an 
example. Dolgui et al. [65] propose reconfigurability as an SC parameter that characterizes 
the SC in an uncertain and changing environment. They do so by addressing the notion 
of a reconfigurable SC, or an X-network, by taking the DT as a basis for its design. In a 
reconfigurable SC, the organization design at the network level must be shaped by I4.0, 
circular economy, industrial symbiosis and collaborative industry. In SCs, reconfigurability 
plays an important role in I4.0 design principles, such as intelligence, real-time action 
capability, flexibility and sustainability (the last of which comes in its three well-known 
dimensions), as well as enabling technologies such as the DT, in this specific case as SC 
DTs which, according to the authors, are computerized models representing the network 
state for any given moment in real time. SCs’ resilience to fluctuations in make-to-order 
SC environments in customized production cases is addressed by Park et al. [66], who 
propose a logistics CPS, or CPLS, coordinated with agent cyber physical production systems 
(CPPS) in a multi-level cyber-physical system structure based on distributed DT simulation 
technology. Wang et al. [10] address the SC problem from a DT perspective by detailing its 
benefits and potential compared to other approaches: (i) with synchronization between the 
physical and virtual twin, the DT promotes faster action and response to reduce lead times; 
(ii) with dynamic and comprehensive data collection, the DT improves forecast accuracy; 
(iii) with high-quality modeling, the DT significantly improves planning verifications. Thus 
in the I4.0 and SC4.0 eras, the DT promotes demand forecasting, aggregate planning and 
inventory planning to be more analytical, reliable, efficient and quick to obtain, which all 
favor SC resilience. 

As for using ML techniques to support production planning and control problems 
in the SC domain, it is worth noting that most contributions focus on the operational 
decision level. Of those dealing with planning at the tactical decision level, most focus on 
either inventory replenishment or, to a lesser extent, dynamic supplier selection problems. 
Alves and Mateus [67] consider a DRL approach based on an improved version of the 
proximal policy optimization algorithm (PPO), called PPO2, to solve the inventory problem 
of a four-step SC with two nodes per step and stochastic demands. The optimization 
approach for Peng et al. [68] is similar, but the modeled problem considers a simpler SC 
composed of three stages—plant, plant warehouse and retailer—subject to independent, 
stochastic and seasonal demand. In it the adequate and stable supply of raw materials 
is assumed, but the plant’s production capacity is limited. The article of Boute et al. [69] 
offers a conceptual approach, and its objective is to describe the key design choices of 
DRL algorithms to facilitate their implementation into the inventory control task in SCs. 
It first introduces MDPs for inventory control optimization in their different solution 
approaches. Second, it describes the use of neural networks to solve MDPs, as well as the 
different methods that arise according to how the function of Bellman equations is used for 
the neural network design. After these theoretical introductions, the authors explain the 
procedure followed to develop DRL algorithms by providing a taxonomic analysis. The 
research by Afridi et al. [70] focuses on the environments of certain complex SCs, such as 
those in the semiconductor industry, where innovation cycles are short, production lead 
times are long and demand uncertainty is high. These operating conditions in SCs mean 
that semiconductor manufacturers are particularly exposed to the undesired amplification
of demand fluctuations within the chain, a phenomenon known as the bullwhip effect, 
which was described by Lee et al. [76]. In this context, the authors propose adopting 
a collaborative strategy known as vendor-managed inventory (VMI), in which the
supplier takes control and full responsibility for replenishing the customer’s inventory 
by defining minimum and maximum inventory levels, and all supported by the deep 
Q-network (DQN) method. The authors consider a two-stage SC and model this problem 
as an MDP. Synchronization of SCs as a means to avoid the bullwhip effect in stochastic 
environments constitutes the central theme of the research by Kegenbekov and Jackson [71]. 
Indeed, an SC with synchronized stages and nodes can prevent the dynamics of cascading
inventory increases and decreases that follow unanticipated fluctuations in demand, and
mitigate the bullwhip effect caused by operational errors. A DRL agent can perform
the adaptive coordination needed to achieve such synchronization, as long as end-to-end
visibility in the SC is complete. As an MDP, the authors model a problem characterized
by a single-product, multi-stage, single-node-per-step SC environment in which a
PPO agent has to choose how many products to order from all the SC agents in each step
and, thus, indirectly obtains local inventory levels.
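
The amplification of demand fluctuations discussed above is commonly quantified as the ratio between the variance of the orders placed upstream and the variance of end-customer demand. The following minimal Python sketch illustrates this measure. The function name and the two data series are hypothetical and are not taken from any of the reviewed articles.

```python
from statistics import pvariance


def bullwhip_ratio(orders, demand):
    """Variance of upstream orders divided by variance of customer demand."""
    demand_var = pvariance(demand)
    if demand_var == 0:
        raise ValueError("demand has zero variance; the ratio is undefined")
    return pvariance(orders) / demand_var


# Hypothetical series: customer demand and the orders a retailer passes upstream.
demand = [100, 104, 98, 102, 96, 100]
orders = [100, 112, 90, 108, 84, 106]

ratio = bullwhip_ratio(orders, demand)
amplified = ratio > 1.0  # a ratio above 1 indicates amplification along the chain
```

A synchronized SC of the kind pursued by the DRL agents above would be expected to keep this ratio close to 1.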

The application of the ZDM philosophy to the SC domain has also been a topic 
addressed by researchers, albeit sparsely. Most focus more on the quality management 
discipline than on production planning and control, and the zero-defect outcome comes 
about from indirectly applying other strategies or philosophies. For Siddh et al. [72], the 
objective is to integrate lean six sigma into SCs instead of ZDM, but the zero-defect outcome 
is indirectly achieved as an effect. Within the lean six sigma framework, the authors place
a central idea: once the number of defects in a process is known, it is possible to work out
systematically how to eliminate them. This research does not address resilient SC properties
and, as the authors state, the only mention made to the sustainability issue is through 
the 5S of lean manufacturing: sort, store, shine, standardize, sustain. Pardamean and 
Wibisono [73] propose a framework to explain the impact of six sigma on SC performance 
based on increasing process capability in the value stream by seeking zero defects and 
reducing process variation. This approximates the aforementioned exogenous strategy
for mitigating the stochasticity of the environment, also described as an antidisturbing
strategy, and thus favors the automation of SC production systems and processes, and the
capability to respond in real time to changes in the environment. In this research, sustainable
SCs’ logistics performance is assessed using three categories, namely sustainable supplier
selection, sustainable production and sustainable delivery. Poornachandrika and Venkatasudhakar [74]
present a behavioral process and a system model for achieving zero defects with a case
study conducted in an automotive company. This article focuses mainly on the transformation
of quality within SCs. One of its main conclusions is that the elimination of human
intervention in some processes improves results, which relates it to automation. Unlike the
above authors, Thakur and Mangla [75] understand the zero-defects concept in the SC as 
one of the final effects of sustainable practices. 

Finally, it is worth noting that the relevant literature on MPS, DT, ML and ZDM 
applied specifically to the SC shows a common thread that should be highlighted here. 
Most articles present research results that, in one way or another, and to a greater or lesser 
extent, are based on some of the design principles and enabling technologies of I4.0 and,
therefore, of SC4.0, from a positive perspective of both paradigms; in other words, from a
position that assumes, as a valid axiom, that introducing I4.0 and SC4.0 into a context such
as that of SCs only leads to positive effects. Some researchers argue that this is not really
the case. Adopting I4.0 and SC4.0, in addition to opportunities, involves barriers and poses
risks [77] that must be duly considered when addressing any digital transformation project
in the SC, and which will depend largely on the selected digitization strategy and the core
capabilities acquired by the SC through that strategy [78].

From the review, it can be concluded that: (i) the existing literature on the MPS problem
addressing the DT, ML and ZDM individually is abundant and varied, but the literature that
addresses the problem from a joint perspective is practically nonexistent; (ii) DT technology
is considered by researchers an enabling tool to achieve higher efficiency and reliability 
levels by endowing SC systems with capabilities, such as decision-making automation, 
real-time response, end-to-end visibility or disruptive risk management; (iii) conceptual 
framework or model proposals based on the DRL-driven DT are very limited; (iv) using 
ML methods to support production planning in the SC domain is also a limited practice 
that centers mostly on DRL-based methods; (v) of all the DRL-based methods followed 
by the researchers in the SC planning domain, PPO implementations have become more 
prominent in the last 3 years, followed by DQN algorithms, whose use is currently declining
in favor of PPO and its variants, as previously indicated; (vi) the ZDM issue in the SC 
domain is still not approached as a per se strategy, but appears as an effect of applying other 
strategies, such as lean manufacturing, six sigma, or their merger lean six sigma, despite the 
remarkable and growing interest shown by researchers in applying ZDM to other planning 
domains, especially at the operational decision level. Only a couple of authors mention the
potential of this strategy for the mitigation of disturbances that affect processes, which is
widely exploited in other planning contexts, especially in operational decision problems
such as job scheduling and sequencing.


3. Proposal 


The proposal of a conceptual framework for the smart MPS based on the DT-ML-ZDM
scheme is formulated in the following five stages in this section: (i) alignment axes of the
proposal with the I4.0 and SC4.0 paradigms; (ii) integrating the DT for the MPS into the SC
context; (iii) integrating the physical and virtual environments of the DRL-based DT; (iv) 
description of the DRL-based agent’s learning and prescription processes; and, finally, (v) 
the proposal summary. 


3.1. Alignment Axes of the Proposal with I4.0 and SC4.0


The proposal presented here is based on the general assumption that the environment
on which it is developed has the characteristics of an SC4.0, i.e., an SC whose digital
transformation is aligned with the design principles governing I4.0 and is carried out by
using its enabling technologies. In this particular case, some design principles
of I4.0, such as flexibility, intelligence, integration, virtualization, interconnectedness,
interoperability, visibility, real-time action ability, energy efficiency and sustainability [79-82],
play a relevant role, directly or indirectly, in the endeavor to confront SC complexity and
heterogeneity toward more resilience and sustainability on the way toward SC4.0. The
same applies to some of its enabling technologies, such as information and communication
technologies (ICT), cyber-physical systems (CPS) or cyber-physical production systems
(CPPS), the Internet of things (IoT) or the industrial IoT (IIoT), smart enterprise resource
planning (ERP), manufacturing execution system (MES), virtual reality, DT, ML algorithms,
big data, cloud services or cloud manufacturing, semantic technologies and cybersecurity
[79,81,83-85], which are involved in the design of the proposal, along with techniques
such as modeling, simulation and optimization.


3.2. Integrating the DT into the SC Context 


Within the conceptual framework that is herein proposed, the DT is firstly characterized
by virtually replicating the MPS, an operation also known as digital twinning [86].

Based partially on the research by [13,63,66], the proposed DT shapes the MPS as 
two different planes, the physical plane and the virtual one, as shown in Figure 2. In the 
physical DT plane, the MPS is determined by physical processes and resources, meaning 
data and information on the processes and resources from the actual SC environment. The 
main physical processes that determine the MPS are: (i) demand forecasting; (ii) receiving
customer orders; (iii) planning processing; (iv) formalizing the intervening parties’ commitment
to the MPS; (v) communicating the MPS to suppliers; and (vi) controlling MPS evolution.
As for the involved physical resources, the MPS is determined by: (i) manpower; (ii) productive
equipment; (iii) inventory; (iv) started production; (v) subcontracted quantities; (vi)
capacity constraints; and (vii) time as a resource represented by the different milestones
shaping and constraining the problem. The sources and communication systems of these
data and information can vary widely depending on the environment in question,
characterized by the I4.0 and SC4.0 paradigms: CPS/CPPS, sensorization, the IoT/IIoT,
cloud manufacturing, smart ERP and MES, among other I4.0 enabling technologies. The feeding
of data and information to the DT from any SC node must be automated, and its real-time flow
must be guaranteed.

Figure 2. Integrating the DT into SC context. Figure based on Serrano et al. [87].


In order to perform the analysis, simulation, optimization and prescription, the data
and information from the physical SC environment must be replicated and processed
virtually at two different levels: the backend, or support level, and the frontend, or interface.
The backend forms part of the DT development and is responsible for running the existing 
system logic behind the interface with the human operator. In the backend, the processes 
and resources data and information from the physical plane are translated into virtual 
processes and resources. The virtual processes that enable DT functioning in the backend 
are: (i) simulation in the virtual environment for agent training, based on historical data 
or the generation of synthetic scenarios; (ii) agent training in the virtual environment, a 
parallel and simultaneous process to the previous one; and (iii) agent prediction, herein 
called prescription, a process enabled by the successful completion of the training process. 
By virtual backend resources, we mean both the data and information related to the real 
plane elements, as well as those related to the formulation and modeling of the MPS 
problem, but they are all coded and combined in such a way that they can feed the above 
simulation, training and prescription processes based on the DRL method. This includes 
data and information from: (i) the MDP model of the MPS and the DRL model; (ii) demand;
(iii) costs; (iv) lot sizing; (v) capacity; (vi) deadlines and periods; and (vii) possible
policies. From these backend processes, data and information, the frontend, as an interface
specially prepared for human users, automatically provides in real time the schedule that
is currently prescribed by the agent and the necessary information about resource use.
This information can also automatically feed other tactical decision level processes, such as 
MRP, inventory control or capacity requirements planning (CRP). 

The DT backend, and the MPS data and information contained therein, are elements
that, in principle, belong to the manufacturer’s sphere and are not replicated for other
SC stakeholders, i.e., suppliers, warehousers, retailers or, in some cases of customized
manufacturing, even customers. Unlike the backend, the DT frontend, whose replication
scope extends beyond the manufacturer’s sphere (the centre of the SC within this
framework), is shared with other SC stakeholders in a collaborative cloud-computing environment
to provide end-to-end visibility to each SC actor and the possibility of real-time
process synchronization to achieve: (i) a greater SC ability to cope with unexpected demand
fluctuations, which makes it more resilient; and (ii) optimized use of resources by enabling
inventory reduction, improved transportation efficiency, reduced energy use and a shorter lead
time, and thus lower costs, among other effects, which result in greater sustainability.

Within this framework, the SC is understood as a single domain for all the intervening 
SC stakeholders, where each one uses personalized data and information blocks about 
the MPS with different access categories according to their particular needs, but all from a 
single common origin: the DT. This scheme not only facilitates the flow of data and information
about production planning among actors, but also creates a coordination channel
for the zero-defect strategy in the SC as it makes it possible to: (i) enable collaborative 
manufacturing with the DT as a means of sharing data and information about processes 
and resources; (ii) for each involved stakeholder, monitor the MPS process parameters 
that need to be shared in this collaborative manufacturing context to improve early defect 
detection, or even prediction, as a way to empower prevention policies and to, thus, better 
cope with disturbing or disruptive events and their subsequent recovery; (iii) enhance 
data storage, analysis and visualization by unifying these performances through the DT; 
(iv) quickly reconfigure and reorganize the MPS whenever necessary in a coordinated 
manner by gaining efficiency and saving idle times for this reason; and (v) collaboratively 
launch real-time production rescheduling across the entire SC, which is generated and 
spread by the DT. In a nutshell, the framework covers: (i) collaborative manufacturing;
(ii) process monitoring; (iii) data management enhancement; (iv) reconfiguration and
reorganization; and (v) real-time rescheduling ability. These are five of the seven system
areas formulated by Lindström et al. [16] in their model for ZDM, which also include
continuous quality control and online predictive maintenance. They would be collected and
considered within this framework to favor a zero-defect goal in the SC and, in this specific
manner, to understand MPS processes and fight against process failures in order to minimize,
mitigate and eliminate possible disturbances that can potentially place the SC’s normal
operation at risk, which leads to higher resilience levels.

The implementation of the DT for the SC smart MPS according to the described 
framework would require several stages to be extended throughout the chain as a whole. 
Nevertheless, the first and most important one is to develop the manufacturer’s specific 
domain, where the backend is located as the core of the DT, before extending it to the 
scope of the other actors involved in the SC. The basic infrastructure and processes in this 
restricted DT space are described in the two following subsections of the proposal. 


3.3. Integrating the Physical and Virtual Environments of the DRL-Based DT 


The DRL-based DT is configured within this framework as a set of overlapping 
and interrelated layers, where each individual layer demarcates a defined part of the DT 
environment (Figure 3). This setup is partially based on the research by Serrano et al. [88] 
for a smart DT for ZDM-based job-shop scheduling. 

All these DT layers or elements act as a receiver, processor and/or generator of data 
and information, depending on the characteristics of the role played in the DT. The physical 
environment of the DT groups the following five elements: (i) the hardware and software 
making up the DT frontend interface and backend processing core; (ii) the hardware and 
software for storing the dataset in the cloud; (iii) the IIoT; (iv) cyber-physical systems (CPS)
distributed throughout the SC physical environment; and (v) information captured locally 
on the current state of production and resources that is relevant to the MPS. Regarding the 
virtual environment, it groups the following elements: (i) demand forecasts and the current 
status of customer ordering, dynamically updated in real time; (ii) the DRL agent; (iii) 
the master scheduling policy; (iv) the simulation environment for agent training; (v) the 
accumulated training data; and (vi) the set of actions taken by the agent on the MPS. 

Figure 3. Integration of the physical and virtual environments of the DRL-based DT.


All these elements are synchronized and constitute a single cohesive environment in 
the DT. 


3.4. Description of the DRL-Based Agent’s Learning and Prescription Processes 


Both processes are based on the DRL method [69], and are basically developed by
two elements, the training environment and the DRL agent (Figure 4), to be implemented
in a DRL framework based on Python code with the help of its specialized open-source
libraries.

The training environment is the MPS modeled as an MDP in such a way that it is 
made up of: (i) an observation space; (ii) an action space; (iii) an initial state; (iv) the state 
transition function. The observation space specifies the variables of the MPS
problem and delimits the bounds between which they may vary in each period. The action
space determines the variety of actions that can be decided about the MPS problem and to 
what extent. The initial state represents the MPS state during the first period considered 
in the MPS training cycle, and is defined by the value taken by the MPS variables in the 
observation space during the initial period. Finally, the state transition function defines 
what varies, and to what extent, between one state and the next after an agent action is 
applied in the valid action space. This environment, to be implemented in Python
code with the OpenAI Gym library, is assisted by an ad hoc scenario generator, which can
create synthetic problem instances that are adequately modeled to facilitate agent training.
The training process can also be assisted with stored historical MPS data modeled according
to MDP requirements if data are available and if deemed necessary or convenient.
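
As an illustration of the four MDP components just listed, the following minimal sketch shows a toy single-product MPS environment. It mimics the reset/step interface popularized by OpenAI Gym but is written in plain Python so that it runs without dependencies. The class name, the state variables, the parameter values and the fill-rate reward are all assumptions made for illustration and are not the formulation of this framework.

```python
import random


class ToyMPSEnv:
    """Hypothetical single-product MPS environment with a Gym-style interface."""

    def __init__(self, horizon=6, max_lot=50, capacity=40, seed=0):
        self.horizon = horizon    # number of periods in one planning cycle
        self.max_lot = max_lot    # upper bound of the action space (lot size)
        self.capacity = capacity  # production capacity per period
        self.rng = random.Random(seed)

    def reset(self):
        # Scenario generator: draw a synthetic demand value for each period.
        self.demand = [self.rng.randint(10, 40) for _ in range(self.horizon)]
        self.period = 0
        self.inventory = 20       # initial state: stock on hand
        return self._observe()

    def _observe(self):
        # Observation: (period index, inventory on hand, demand of the period).
        due = self.demand[self.period] if self.period < self.horizon else 0
        return (self.period, self.inventory, due)

    def step(self, action):
        # Clip the action to the valid action space and to capacity.
        lot = max(0, min(int(action), self.max_lot, self.capacity))
        available = self.inventory + lot
        demand = self.demand[self.period]
        served = min(available, demand)
        # State transition: unsold stock is carried into the next period.
        self.inventory = available - served
        self.period += 1
        # Assumed reward: fill rate minus a small inventory holding penalty.
        reward = served / demand - 0.01 * self.inventory
        done = self.period >= self.horizon
        return self._observe(), reward, done, {"served": served}


# Usage with a naive policy that produces exactly the demand of each period.
env = ToyMPSEnv()
obs = env.reset()
done = False
total = 0.0
while not done:
    obs, reward, done, info = env.step(obs[2])
    total += reward
```

In an actual implementation the observation and action spaces would be declared with the space classes of the chosen library, and the reward would reflect the service-level objective of the modeled MPS.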

Figure 4. Setup of the DRL-based DT-driven MPS.


In the training stage, the DRL agent must play its role in the arena shaped by the 
above-described environment. From the initial state prepared by the scenario generator, 
the agent essentially acts in the environment by triggering an advance toward a new state 
for the next period. The environment grants the agent a reward for this step, whose value 
essentially depends on how much the new state improves the MPS. With this reward and 
the new MPS state, the agent performs a new action that depends on the selected type of 
DRL method, i.e., value-based, policy-based, hybrid methods such as actor-critics, among 
others, which lead to a new state and a new reward, and so on, period after period, to 
complete a planning cycle. These training cycles are repeated as often as
necessary until the agent’s performance evaluation exceeds a certain threshold, or until the
DRL algorithm is changed because the threshold has not been exceeded after a predetermined
number of cycles. Finally, when training is evaluated as satisfactory and the DRL agent is
considered sufficiently trained, the latter is prepared to interact with the real environment
(which, unlike the training environment, is dynamic and continuous) and, from this, new MPS
states are prescribed.
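
The stopping rule described above (repeat training cycles until an evaluation exceeds a threshold, or give up after a predetermined number of cycles) can be sketched as follows. To keep the example short and dependency-free, a simple tabular value learner stands in for the DRL agent. The lot sizes, the constant demand, the reward and the threshold values are hypothetical.

```python
import random

LOTS = [0, 10, 20, 30]     # hypothetical discrete action space (lot sizes)
DEMAND = [20, 20, 20, 20]  # hypothetical constant demand per period


class TabularAgent:
    """Stand-in for a DRL agent: keeps a running value estimate per action."""

    def __init__(self, epsilon=0.2, seed=0):
        self.values = [0.0] * len(LOTS)
        self.counts = [0] * len(LOTS)
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def act(self, explore=True):
        # Epsilon-greedy choice while training, purely greedy when evaluating.
        if explore and self.rng.random() < self.epsilon:
            return self.rng.randrange(len(LOTS))
        return max(range(len(LOTS)), key=lambda a: self.values[a])

    def learn(self, action, reward):
        # Incremental mean of the rewards observed for this action.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def run_cycle(agent, explore):
    """One planning cycle: act period after period and average the rewards."""
    total = 0.0
    for demand in DEMAND:
        action = agent.act(explore)
        reward = -abs(LOTS[action] - demand)  # 0 is best: lot equals demand
        if explore:
            agent.learn(action, reward)
        total += reward
    return total / len(DEMAND)


def train(agent, threshold=-1.0, max_cycles=200):
    """Return (trained, cycles_used); trained is False if the budget runs out."""
    for cycle in range(1, max_cycles + 1):
        run_cycle(agent, explore=True)
        if run_cycle(agent, explore=False) >= threshold:
            return True, cycle
    return False, max_cycles  # signal that another algorithm should be tried


trained, cycles = train(TabularAgent())
unreachable, budget = train(TabularAgent(), threshold=1.0, max_cycles=5)
```

The second call uses a threshold that no policy can reach, so it exhausts its budget and returns the signal that would trigger a change of algorithm.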

The DRL agent can be a Python algorithm to be implemented with the RLlib library (part of Ray)
and TensorFlow, specifically designed to interact in the above-described training
environment with basically two operation modes: training and prescription. When training,
the DRL agent collects the current MPS state and predicts a new state for the next period,
and so on, until all the periods of a complete training MPS cycle have been completed. The
agent’s predictions are based on a methodology learned from synthetic or real data, which
depends on the DRL methodology selected from those existing in the RLlib library and the
adjustment of its hyperparameters. This library, which includes the most basic versions,
those of the policy-based or value-based type, mainly collects the most usual hybrid-based
DRL methodologies, such as actor-critic or gradient-based methods, e.g., policy
gradients (PG), soft actor-critic (SAC), advantage actor-critic (A2C, A3C), or proximal
policy optimization (PPO), and some high-performance architectures such as asynchronous
proximal policy optimization (APPO). DRL algorithm selection relies on an additional
module attached to the agent that evaluates its performance during training and has the 
capacity to modify: (i) the agent’s number of training cycles, also called epochs; (ii) the 
DRL algorithm type depending on the result of evaluations; and (iii) the adjustment of 
certain basic hyperparameters that varies according to the selected DRL algorithm. 
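
A minimal sketch of how such a selection module could operate is given below. It tries candidate algorithms in order, increases the number of epochs before switching to the next candidate, and carries algorithm-specific hyperparameters with each candidate. The function names, the hyperparameter values and the scoring stub are hypothetical; in practice the stub would be replaced by an actual training and evaluation run.

```python
def select_algorithm(candidates, train_and_evaluate, threshold,
                     epochs=10, max_epochs=40):
    """Return (score, name, params, epochs) for the first run reaching the
    threshold, or for the best run seen if none reaches it."""
    best = None
    for name, params in candidates:
        budget = epochs
        while budget <= max_epochs:
            score = train_and_evaluate(name, params, budget)
            if best is None or score > best[0]:
                best = (score, name, params, budget)
            if score >= threshold:
                return best
            budget *= 2  # (i) adjust the number of training epochs
        # (ii) budget exhausted for this algorithm: move to the next one
    return best


def fake_train_and_evaluate(name, params, epochs):
    # Placeholder for a real training run; the scores are made up.
    base = {"DQN": 0.50, "A2C": 0.60, "PPO": 0.70}[name]
    return base + 0.005 * epochs


# (iii) each candidate carries its own hypothetical hyperparameters.
candidates = [
    ("DQN", {"lr": 1e-3}),
    ("A2C", {"lr": 7e-4}),
    ("PPO", {"lr": 3e-4, "clip": 0.2}),
]
result = select_algorithm(candidates, fake_train_and_evaluate, threshold=0.85)
```

With the made-up scores above, only the third candidate reaches the threshold, and only at the largest epoch budget.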


3.5. Proposal Summary


In summary: (i) the proposed DT is conceived as a DSS implemented by the manufacturer
and partially shared with suppliers, warehousers, retailers and, depending on the
case, customers by means of a cloud-computing system; (ii) from all these SC stakeholders,
the DT receives the data and information about the processes and resources, which are
properly modeled as a DRL instance; (iii) once the DRL agent is trained, the DT processes
the MPS problem automatically and autonomously in real time based on the DRL method;
(iv) the DT provides as output a permanently optimized MPS in the event of any change in
input, but respects the committed ordering policy on the fixed demand horizon, if any;
(v) the DT allows the manufacturer to transmit changes without delay to lower planning
levels, such as MRP, CRP or inventory control; (vi) the DT delivers a master supply schedule
to suppliers at their different tiers for their own planning; (vii) the DT delivers the products
available to promise per period to warehousers, retailers and, depending on the case,
customers; and (viii) the DT delimits the data and information of each actor depending on its role.
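
Point (viii) can be illustrated with a minimal sketch of role-dependent views over a single MPS record. The record fields, the role names and the visibility rules below are hypothetical and only show the idea of a common origin with different access categories.

```python
# Hypothetical MPS record held in the DT for one planning period.
MPS_RECORD = {
    "period": 3,
    "planned_production": 120,
    "available_to_promise": 35,
    "component_requirements": {"frame": 120, "motor": 240},
    "unit_cost": 18.5,
}

# Hypothetical access categories: the fields each role is allowed to see.
VISIBLE_FIELDS = {
    "manufacturer": set(MPS_RECORD),
    "supplier": {"period", "component_requirements"},
    "warehouser": {"period", "available_to_promise"},
    "retailer": {"period", "available_to_promise"},
}


def view_for(role, record):
    """Return only the fields of the record that the given role may access."""
    try:
        allowed = VISIBLE_FIELDS[role]
    except KeyError:
        raise PermissionError(f"unknown role: {role}") from None
    return {key: value for key, value in record.items() if key in allowed}


supplier_view = view_for("supplier", MPS_RECORD)
retailer_view = view_for("retailer", MPS_RECORD)
```

Here the supplier receives only the component requirements derived from the schedule, while the retailer receives only the quantities available to promise.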


4. Discussion 


The MPS plays a crucial role in the SC and has been a sustained driver of research into 
new planning methodologies, which has provided continuous scientific development and 
generated new models with a wide range of approaches. However, in today’s dynamic 
environment, the growing scale and complexity of global SCs and the new technological 
developments occurring at an ever-increasing speed mean that knowledge gaps persistently 
appear. In the case at hand, the aim of this paper is to respond to the lack of contributions 
detected in the literature on the joint use of the ZDM management model and the ML-based 
DT enabling technology to pursue smart master planning to, thus, contribute to a resilient 
and sustainable SC. 

On the mechanisms that lead one of the I4.0 enabling technologies par excellence, such
as the DT, to constitute a competent tool to enable the MPS to achieve higher automation,
autonomy and real-time action capacity levels, it can be stated that the DT is a system
that combines physical entities (in our case, the data and information about the real master
planning environment) with their virtual counterparts (the virtual MPS) by taking
advantage of the benefits of virtual and physical environments to benefit the whole system [11].
The DT captures information from the physical entity, which it stores, processes,
analyzes and evaluates so that the knowledge generated after these operations can be subsequently
applied not only to current physical entities, but also to future ones [11], and all this
without localization restrictions given its ability to enable shared virtual spaces where data
and information about systems become more visible [12,13] and, thus, enable collaborative
production scenarios. The literature commonly relates the implementation of this technology
to the digital transformation of processes from the perspective of their automation [89] and
their endowment with higher autonomy levels [90]. Moreover, the DT’s
potential to enable real-time management is a recurrent research topic in the logistics and 
industrial field in general [36], but also in the area of production planning and control in 
particular [37], especially when assisted by artificial intelligence [7]. Not many examples 
appear in the literature that show the benefits of the DT in the specific MPS field [18], but 
they can be found in many other SC fields, such as real-time monitoring and control [62], 
risk management [13], recovery from disruption [65], SCs’ resilience to disruption [66], and
planning verifications related to demand forecasting, aggregate planning, and inventory
planning [10]. One limitation of this technology is that the existing commercial solutions 
on the market currently have relatively high acquisition and maintenance costs, and need 
to be handled by qualified personnel. However, the possibility of implementing ad hoc 
solutions with open source tools has increased significantly since this technology began to 
make its way in the early part of the last decade. 

Regarding ML and its ability to cope with NP-hard computational complexity levels,
once again it is true that, in the production planning and control area, the academic community
has chosen to address mostly the application of ML methods in process problems other
than the MPS, i.e., at the tactical decision level, mainly in inventory control and supplier
selection problems, and at the operational decision level, in the various configurations of
the job scheduling problem [18]. It is important to emphasize that these problems share
with the MPS the possibility of being modeled as an MDP, which would a priori allow the
application of the reinforcement learning methodology with similar guarantees of success
in the MPS as in other problems. However, it must be assumed that the complex structure
of current SCs, especially global ones with many stages and nodes, the number of
variables included in the modeled problem and its intrinsically stochastic condition imply
that the modeling of real cases with the reinforcement learning methodology, but without
the additional assistance of other methods, constitutes a considerable challenge. Only
through the gradual incorporation of the DRL methodology [69], a combination of the 
reinforcement learning methodology with deep learning—another ML methodology that 
uses artificial neural networks to transform a set of inputs into a set of outputs, that solve 
tasks that involve handling complex and high-dimensional raw input data sets [91]—has it 
been possible to begin to consider the study of SCs with certain complexity, e.g.: (i) the
multistage SC problem of Alves and Mateus [67], validated with a four-stage SC scenario
and two nodes per stage, local inventories, lead time, a single product, and demand uncertainty;
(ii) the capacitated SC problem of Peng et al. [68], validated with a three-stage SC
scenario, one node in the first, two in the second and three in the last stage, capacitated
production, independent, stochastic and seasonal demand, and a single product; (iii) the
case of Meisheri et al. [92] who, despite restricting the validation of their retailers’ inventory
replenishment to the last SC layers, i.e., warehouse and retailer, consider the existence of
product variety, with instances of 100 and 220 products, which substantially increases
combinatorial computation, and incorporate lead time, limited storage capacity, cross-product
restrictions, and weight and volume transportation restrictions. Computational limitations
in this regard are manifested as the size of the problem to be solved in terms of the size 
of the input dataset, and especially the size of the modeled problem’s observation space. 
Nevertheless, advances in the DRL methodology are continuous and new implementations 
with meta-learning, contextual bandits or high-performance architectures, among others, 
frequently appear, whose application in the SC planning field is yet to be explored as the 
most advanced implementations in the related literature do not go beyond gradient-based 
methods, such as PPO, advantage actor-critic (A2C), or even the basic DQN.

The ZDM management model is often associated with I4.0 for presenting largely
compatible objectives and providing synergistic and complementary approaches [17,93,94].
Beyond the most well-known ZDM objectives, such as minimizing failures and defects
and their early online detection, this management model shares with I4.0 the purpose
of minimizing production costs and making production more efficient and sustainable 
by reducing the number of failures, breakdowns and defective parts [95]. Although the 
ZDM model does not fully appear in the literature about SCs and the MPS, it should 
be emphasized that the effects of meeting its objectives entail certain benefits for SCs 
whose discussion is of interest in the present research: (i) minimization of defects on line,
regardless of whether they are failures, breakdowns or defective quality parts, is a factor that,
in turn, favors the minimization of the disturbances that usually affect the system [7,15,16,96].
Thus, it is a beneficial action for the automation of processes; and (ii) the sustainability of
SCs is favored by the minimization and mitigation of defects in two of its dimensions, 
economic and environmental, because achieving the ZDM strategy favors the reduction 
of costs, but also the reduction of emissions and energy, and raw material use [17,94]. It 
should also be noted that ZDM and resilience are related in the literature [97] as they 
share some significant points. The path toward higher resilience levels in SC involves 
promoting both properties that reduce the vulnerability of the SC to disruptive events and 
those that reduce its recovery time. ZDM has the dual potential to improve both groups of 
properties as manufacturing without failures or defects is robust, persistent and, therefore, 
less vulnerable manufacturing, but also allows faster recovery after disruptions because 
it is more agile and adaptable. As for the relation between ZDM and sustainability, it is 
remarkable how the research of Psarommatis et al. [19] establishes such a direct relation, 
and the word sustainability plays a leading role in the very definition of ZDM provided in 
this paper: “ZDM offers a holistic approach, aiming at greater manufacturing sustainability, 
which ensures both process and product quality by reducing product defects through 
the use of corrective, preventive, and predictive techniques made possible by data-based 
technologies, and guarantees that no defective products leave the production site and 
reach the customer”. However, the ZDM model has some limitations that should be 
mentioned. As a quality improvement (QI) method, ZDM differs from traditional methods, 
such as lean manufacturing, six sigma or total quality management (TQM), because, while 
traditional methods use historical data to improve the future without considering the 
current production status, ZDM employs both historical and current data, which are essential for 
tracing the cause of a defect and learning from the event. This advantage of ZDM has 
a negative counterpart insofar as it requires intensive real-time data use, without which 
the model’s efficiency is compromised [19]. 

Thus, it seems reasonable to think that the DT technology, the ML method and the 
ZDM model applied in the MPS are each aligned with a smart MPS model that 
contributes to a more resilient and sustainable SC. However, this alignment is reinforced in 
the triple combination of the DT-ML-ZDM scheme given the cross synergies among them, 
of which the following stand out: (i) the DT technology is favored by the ML method because 
the latter enables the real-time prescription of solutions to the MPS problem in high-dimensional 
problems, a field in which traditional methods such as analytics, simulation and heuristics 
are limited; (ii) the DT technology is favored by the ZDM model because the latter mitigates 
the number and magnitude of disturbances on the system, which favors its automation, 
including DT functions; (iii) the ML method is favored by the DT technology because 
virtualizing the real environment allows the ML agent to act on it only once it 
has been positively evaluated, that is, once it is able to prescribe after training, which confers 
planning robustness; (iv) for the same reason, the ZDM model is favored by the DT technology 
because the fact that the ML agent only acts once it is trained favors the reduction and 
mitigation of errors and, thus, the elimination of defects; and, finally, (v) the ZDM model is 
favored by applying the ML method because the latter supports the real-time data 
feed that the former needs to carry out its function properly. The potential benefits 
are, therefore, significant. Yet the other side of the coin is marked by the possible barriers 
and risks associated with implementing the DT-ML-ZDM scheme. According to the 
research of Müller et al. [77], these are associated with: (i) suppliers and 
SC partners, e.g., their critical attitude toward changes, or rejection of data transparency; 
(ii) organization and implementation, e.g., the amount of investment required, or lack of 
resources or expertise; (iii) data management, e.g., data security, quality or availability; 
(iv) human aspects, e.g., the role of new employees, labor market disruptions or critical 
attitudes to change; (v) technology, e.g., its implementation procedure, overestimation of its 
benefits, use of immature systems or poor selection; and, finally, (vi) legal issues and standards, 
e.g., public framework conditions, standardization and business ethics. Digital integration 
with customers is also an aspect to consider in this regard given the positive influence 
it will have on SC management and performance, as indicated by Queiroz et al. [78]. 
Thus, an effective MPS digital transformation process according to the herein proposed 
scheme must take into account all these challenges and risks beyond the simple synergistic 
implementation of DT-ML-ZDM. 

That said, it is also worth mentioning that, from a practical perspective, MPS robustness 
rests on several fundamental pillars, of which the following are highlighted: (i) the accuracy 
of demand forecasts; (ii) the consideration of realistic constraints and deadlines; (iii) the 
use of accurate calculation methods; (iv) the flexibility to synchronize with the evolution 
of demand patterns; (v) a fluid movement of information between agents and areas; and 
(vi) the involved parties’ acceptance. Thus, instruments that support 
this structure by making SC systems and processes visible 
to the agents involved, in their different areas and at their distinct decision levels, and by providing 
collaborative interaction capacity, wide data access, simulation and off-line analysis power, 
and real-time action capacity, can all be key to minimizing, mitigating and/or eliminating failures 
and defects, and to reinforcing SC resilience and sustainability. Therefore, it is considered that, 
in the particular MPS context, virtualization by means of the DT, the intelligence imbued 
in decision-making processes with ML assistance, and the stable fluency of processes in 
line with the zero-defect philosophy have the capacity to play a significant role in the 
smart MPS. 


5. Conclusions 


This paper proposes an initial DT-based conceptual framework to model, optimize 
and prescribe the MPS in an SC in a ZDM context. This framework focuses on developing 
optimization algorithms to solve the MPS problem in the specific described environment 
based on digital twinning with the support of DRL techniques. 

The proposed DT-based model, designed to accommodate the set of stakeholders in 
the SC, along with their real and virtual processes and resources in two different planes, is 
described. The DRL-based DT-driven MPS setup is also presented. 

Both the described framework and its configuration are considered a first contribution 
of this research. Its design aims to improve SC performance by reinforcing its digitization, 
intelligence, visibility, interconnectedness and organization, which all take the SC toward 
higher resilience and sustainability levels, a goal for any traditional SC that intends to be 
transformed into SC4.0. The DT technology is distinguished by the potential to simultaneously 
and positively influence all these aspects because: (i) digitization is an intrinsic 
property of a DT; (ii) although the commonest purpose of a DT is to simulate, analyze, 
predict or optimize, this technology admits moving one step further toward 
autonomous prescription, a capability that can be regarded as a form of 
intelligence; (iii) a model in which the DT replicates a specific planning subject (e.g., the MPS) 
for its shared use across the entire SC has the capacity to take visibility, interconnectedness 
and organization qualities to a higher level; and (iv) a more effective ZDM strategy facilitated 
by the model design contributes not only to SC resilience by minimizing, mitigating or 
eliminating potential disturbing factors, but also to greater sustainability. 

The reinforcement learning approach offers certain benefits that are worth highlighting. 
Proper DRL-based modeling that balances the exploration-exploitation trade-off 
can help to solve the problem of correlating immediate planning actions with 
their long-term consequences. In addition, and unlike analytical or heuristic approaches, 
the DRL-based modeling approach provides an acceptable solution in real-world environments, 
such as manufacturing, for those problems in which feedback is often subject 
to time delays, provided that these problems can be characterized as Markov decision 
processes (MDP), which is the case of the MPS problem. It is also shown that the DRL 
method is an effective tool for dealing with problems that are harder to solve with analytical or 
heuristic approaches due to their implicit computational complexity. 
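
As a minimal illustration of how a planning problem of this kind can be cast as an MDP and 
solved with an exploration-exploitation balance, consider the following sketch. It is not the 
model proposed in this paper: tabular Q-learning with an epsilon-greedy rule is used in place of 
a deep network for brevity, and the single-item setting, all names, the capacities, the costs and 
the uniform demand distribution are hypothetical assumptions. 

```python
import random

# All constants below are hypothetical and chosen only for illustration.
MAX_INV = 5           # warehouse capacity (units)
MAX_PROD = 3          # production capacity per period (units)
HOLD_COST = 1.0       # cost per unit held at the end of a period
STOCKOUT_COST = 5.0   # penalty per unit of unmet demand (service-level proxy)


def step(inventory, production, demand):
    """One MDP transition: returns (next_inventory, reward)."""
    available = min(inventory + production, MAX_INV)  # excess output is lost
    served = min(available, demand)
    unmet = demand - served
    next_inventory = available - served
    reward = -(HOLD_COST * next_inventory + STOCKOUT_COST * unmet)
    return next_inventory, reward


def train(episodes=2000, horizon=12, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning; state = inventory level, action = production quantity."""
    rng = random.Random(seed)
    q = [[0.0] * (MAX_PROD + 1) for _ in range(MAX_INV + 1)]
    for _ in range(episodes):
        inv = rng.randint(0, MAX_INV)
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.randint(0, MAX_PROD)  # explore
            else:
                action = max(range(MAX_PROD + 1), key=lambda a: q[inv][a])  # exploit
            demand = rng.randint(0, 3)  # assumed uniform demand per period
            nxt, reward = step(inv, action, demand)
            # The discounted bootstrap term links the immediate action
            # to its long-term consequences.
            target = reward + gamma * max(q[nxt])
            q[inv][action] += alpha * (target - q[inv][action])
            inv = nxt
    return q


def greedy_policy(q):
    """Production quantity prescribed for each inventory level."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]
```

Calling `greedy_policy(train())` returns one production quantity per inventory level. In a 
setting closer to the one discussed here, the table would be replaced by a neural network and the 
`step` function by the virtual plane of the DT. 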

This proposal has some limitations. The model does not foresee the inclusion of 
financial considerations. Moreover, because unwise decisions could put at risk 
the economic value of the resources that the SC actors commit to the MPS, 
it is advisable to restrict the DT’s prescriptive action in a first stage so that the 
final MPS confirmation depends on the human operator. This recommendation 
holds until the system’s reliability is properly verified. 
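
The restriction described above can be sketched as a simple confirmation gate. This is only an 
illustrative assumption about how such a gate might look: the names `MpsProposal` and 
`release_mps`, the `approve` callback and the `verified` flag are hypothetical and do not come 
from the proposed framework. 

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class MpsProposal:
    """An MPS prescribed by the DT: item identifier -> planned quantity."""
    quantities: Dict[str, int]


def release_mps(
    proposal: MpsProposal,
    approve: Callable[[MpsProposal], bool],
    verified: bool = False,
) -> Optional[MpsProposal]:
    """Release the prescribed MPS only if a human operator confirms it.

    `verified` stands for the later stage in which the system's reliability
    has been established and the human confirmation can be bypassed.
    """
    if verified or approve(proposal):
        return proposal
    return None  # the prescription is held back
```

While `verified` is `False`, a proposal rejected by the operator is never released, which keeps 
the final decision with the human in the first stage. 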

Regarding research perspectives, this conceptual framework has to be considered a 
starting point and roadmap for modeling, applications and empirical validations in 
a real-world SC MPS case study. It is also necessary to study whether the modeling approach can 
be extended to other planning levels, such as MRP or inbound and outbound logistics, and 
under what conditions this would be possible. 

Although the proposed conceptual framework accommodates all the intervening 
actors in the SC, developing the model beyond the manufacturer and its suppliers at the 
two closest tiers is challenging and opens up a supplementary research line. The same 
conclusion is reached for the task of incorporating additional supplier tiers into the previous 
two, plus logistics warehousers, wholesale distributors, retailers and, finally, customers. 

A better understanding of the relevance of the human factor in SC4.0 and its planning 
would also be a topic for further research. The European Union’s Industry 5.0 initiative falls 
in line with this, and it is worth mentioning the desirability of further research into the role 
that humans should play in environments where not only the most physically demanding 
or risky tasks are being transferred from humans to systems, but also the responsibility for 
decision making. 

Lastly, the described conceptual framework, and the technical background behind the 
proposed DT, can be adapted to other novel alternative tactical planning frameworks, such 
as adaptive sales and operations planning (AS&OP), which derives from the demand-driven 
adaptive enterprise (DDAE) model, by replacing the MPS subject with other ones, 
e.g., the replenishment of items in the buffers identified at the tactical level. Even when 
formulated as an MDP, the problem can also be modeled as a nonlinear, stochastic and/or fuzzy 
problem to deal with uncertainty, which would be a promising future research line. 